15 research outputs found

    Application and Development of Computational Methods for Ligand-Based Virtual Screening

    The detection of novel active compounds that are able to modulate the biological function of a target is the primary goal of drug discovery. Different screening methods are available to identify hit compounds having the desired bioactivity in a large collection of molecules. As a computational method, virtual screening (VS) is used to search compound libraries in silico and identify those compounds that are likely to exhibit a specific activity. Ligand-based virtual screening (LBVS) is a subdiscipline that uses the information of one or more known active compounds in order to identify new hit compounds. Different LBVS methods exist, e.g. similarity searching and support vector machines (SVMs). In order to enable the application of these computational approaches, compounds have to be described numerically. Fingerprints derived from the two-dimensional compound structure, called 2D fingerprints, are among the most popular molecular descriptors available. This thesis covers the usage of 2D fingerprints in the context of LBVS. The first part focuses on a detailed analysis of 2D fingerprints. Their performance range against a wide range of pharmaceutical targets is globally estimated through fingerprint-based similarity searching. Additionally, mechanisms by which fingerprints are capable of detecting structurally diverse active compounds are identified. For this purpose, two different feature selection methods are applied to find those fingerprint features that are most relevant for the active compounds and distinguish them from other compounds. Then, 2D fingerprints are used in SVM calculations. The SVM methodology provides several opportunities to include additional information about the compounds in order to direct LBVS search calculations. In a first step, a variant of the SVM approach is applied to the multi-class prediction problem involving compounds that are active against several related targets. 
SVM linear combination is used to recover compounds with desired activity profiles and deprioritize compounds with other activities. Then, the SVM methodology is adopted for potency-directed VS. Compound potency is incorporated into the SVM approach through potency-oriented SVM linear combination and kernel function design to direct search calculations toward the preferential detection of potent hit compounds. Next, SVM calculations are applied to address an intrinsic limitation of similarity-based methods, i.e., the presence of similar compounds having large differences in their potency. A specially designed SVM approach is introduced to predict compound pairs forming such activity cliffs. Finally, the impact of different training sets on the recall performance of SVM-based VS is analyzed and caveats are identified.
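To illustrate the fingerprint-based similarity searching mentioned in this abstract, the following is a minimal sketch, not the thesis's implementation. It assumes that each 2D fingerprint is represented as a set of "on" bit positions and that compounds are ranked by the Tanimoto coefficient against a single reference; the compound names, bit positions, and the helper names `tanimoto` and `similarity_search` are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(reference, library, top_k=3):
    """Rank library compounds by decreasing Tanimoto similarity to the reference."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: bit positions stand in for 2D fingerprint features.
reference_fp = frozenset({1, 4, 7, 9})
library = {
    "cpd_A": frozenset({1, 4, 7, 9}),
    "cpd_B": frozenset({1, 4, 8}),
    "cpd_C": frozenset({2, 3, 5}),
}

ranking = similarity_search(reference_fp, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a real screen the sets would come from a cheminformatics toolkit, and the top-ranked database compounds would be the candidate hits.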

    Comparison of Confirmed Inactive and Randomly Selected Compounds as Negative Training Examples in Support Vector Machine-Based Virtual Screening

    The choice of negative training data for machine learning is a little-explored issue in chemoinformatics. In this study, the influence of alternative sets of negative training data and different background databases on support vector machine (SVM) modeling and virtual screening has been investigated. Target-directed SVM models have been derived on the basis of differently composed training sets containing confirmed inactive molecules or randomly selected database compounds as negative training instances. These models were then applied to search background databases consisting of biological screening data or randomly assembled compounds for available hits. Negative training data were found to systematically influence compound recall in virtual screening. In addition, different background databases had a strong influence on the search results. Our findings also indicated that typical benchmark settings lead to an overestimation of SVM-based virtual screening performance compared to search conditions that are more relevant for practical applications.

    Prediction of Compounds in Different Local SAR Environments using ECP

    SD files of the 15 data sets reported in the manuscript are provided. Each data set is identified by its ChEMBL target ID. The file format is described in the file 'description.txt'.

    Compound Pathway Model To Capture SAR Progression: Comparison of Activity Cliff-Dependent and -Independent Pathways

    A compound pathway model is introduced to monitor SAR progression in compound data sets. Pathways are formed by sequences of structurally analogous compounds with stepwise increasing potency that ultimately yield highly potent compounds. Hence, the model was designed to mimic compound optimization efforts. Different pathway categories were defined. Pathways originating from any active compound in a data set were systematically identified, including compounds forming activity cliffs. The relative frequency of activity cliff-dependent and -independent pathways was determined and compared. In 23 of 39 different compound data sets that qualified for our analysis, significant differences in the relative frequency of activity cliff-dependent and -independent pathways were observed. In 17 of these 23 data sets, activity cliff-dependent pathways occurred with higher relative frequency than cliff-independent pathways. In addition, pathways originating from the majority of activity cliff compounds displayed desired SAR progression, reflecting SAR information gain associated with activity cliffs.
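The pathway idea described in this abstract can be sketched as a search over a graph of analog relationships. This is only an illustration under stated assumptions, not the authors' algorithm. It assumes that potency is given as a logarithmic value, that a pathway must strictly increase in potency at every step and end at a compound above a potency cutoff, and that a pathway counts as cliff-dependent when any single step spans at least two log units. The cutoffs, compound names, potency values, and helper names are all hypothetical.

```python
def find_pathways(start, analogs, potency, potent_cutoff=8.0):
    """Enumerate pathways from `start` whose potency rises at every step
    and which end at a compound at or above `potent_cutoff` (assumed value)."""
    pathways = []

    def extend(path):
        current = path[-1]
        # Record the path whenever it has reached a highly potent compound.
        if len(path) > 1 and potency[current] >= potent_cutoff:
            pathways.append(list(path))
        # Strictly increasing potency rules out revisiting a compound.
        for neighbor in analogs.get(current, ()):
            if potency[neighbor] > potency[current]:
                extend(path + [neighbor])

    extend([start])
    return pathways


def is_cliff_dependent(path, potency, cliff_gap=2.0):
    """Assumption of this sketch: a pathway is cliff-dependent if any step
    between consecutive analogs spans at least `cliff_gap` log units."""
    return any(potency[b] - potency[a] >= cliff_gap for a, b in zip(path, path[1:]))


# Hypothetical toy data set: logarithmic potency values and analog relationships.
potency = {"c1": 5.0, "c2": 6.0, "c3": 8.5, "c4": 6.5, "c5": 8.1}
analogs = {
    "c1": ["c2", "c4"],
    "c2": ["c1", "c3"],
    "c3": ["c2"],
    "c4": ["c1", "c5"],
    "c5": ["c4"],
}

pathways = find_pathways("c1", analogs, potency)
flags = [is_cliff_dependent(p, potency) for p in pathways]
for path, flag in zip(pathways, flags):
    print(" -> ".join(path), "(cliff-dependent)" if flag else "(cliff-independent)")
```

Counting the two kinds of pathways per data set would then give the relative frequencies that the abstract compares.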

    Computational polypharmacology analysis of the heat shock protein 90 interactome

    The design of a single drug molecule that is able to simultaneously and specifically interact with multiple biological targets is receiving growing attention in drug discovery. However, the rational design of drugs with a desired polypharmacology profile is still a challenging task, especially when these targets are distantly related or unrelated. In this work, we present a computational approach aimed at the identification of suitable target combinations for multitarget drug design within an ensemble of biologically relevant proteins. The target selection relies on the analysis of activity annotations present in molecular databases and on ligand-based virtual screening. A few target combinations were also inspected with structure-based methods to demonstrate that the identified dual-activity compounds are able to bind target combinations characterized by remote binding site similarities. Our approach was applied to the heat shock protein 90 (Hsp90) interactome, which contains several targets of key importance in cancer. Promising target combinations were identified, providing a basis for the computational design of compounds with dual activity. The approach may be used on any ensemble of proteins of interest for which known inhibitors are available.

    Prediction of Compounds in Different Local Structure–Activity Relationship Environments Using Emerging Chemical Patterns

    Active compounds can participate in different local structure–activity relationship (SAR) environments and introduce different degrees of local SAR discontinuity, depending on their structural and potency relationships in data sets. Such SAR features have thus far mostly been analyzed using descriptive approaches, in particular, on the basis of activity landscape modeling. However, compounds in different local SAR environments have not yet been predicted. Herein, we adapt the emerging chemical patterns (ECP) method, a machine learning approach for compound classification, to systematically predict compounds with different local SAR characteristics. ECP analysis is shown to accurately assign many compounds to different local SAR environments across a variety of activity classes covering the entire range of observed local SARs. Control calculations using random forests and multiclass support vector machines were carried out and a variety of statistical performance measures were applied. In all instances, ECP calculations yielded comparable or better performance than controls. The approach presented herein can be applied to predict compounds that complement local SARs or prioritize compounds with different SAR characteristics.